Selecting the Best Tridiagonal System Solver Projected on Multi-Core CPU and GPU Platforms

نویسندگان

Pablo Quesada-Barriuso

Julián Lamas-Rodríguez

Dora B. Heras

Montserrat Bóo

Francisco Argüello

چکیده

Nowadays multicore processors and graphics cards are commodity hardware that can be found in personal computers. Both CPU and GPU are capable of performing high-end computations. In this paper we present and compare parallel implementations of two tridiagonal system solvers. We analyze the cyclic reduction method, as an example of fine-grained parallelism, and Bondeli’s algorithm, as a coarse-grained example of parallelism. Both algorithms are implemented for GPU architectures using CUDA and multi-core CPU with shared memory architectures using OpenMP. The results are compared in terms of execution time, speedup, and GFLOPS. For a large system of equations, 2, the best results were obtained for Bondeli’s algorithm (speedup 1.55x and 0.84 GFLOPS) for multi-core CPU platforms while the cyclic reduction (speedup 17.06x and 5.09 GFLOPS) was the best for the case of GPU platforms.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Performance Evaluation and Analysis for Conjugate Gradient Solver on Heterogeneous (Multi-GPUs/Multi-CPUs) platforms

High performance computing (HPC) presents a technology that allows solving high intensive problems in a reasonable period of time, and can offer many advantages for large applications in various fields of science and industry. Current multi-core processors, especially graphic processing units (GPUs), have quickly evolved to become efficient accelerators for data parallel computing. They can mai...

متن کامل

GPGPU parallel algorithms for structured-grid CFD codes

A new high-performance general-purpose graphics processing unit (GPGPU) computational fluid dynamics (CFD) library is introduced for use with structured-grid CFD algorithms. A novel set of parallel tridiagonal matrix solvers, implemented in CUDA, is included for use with structured-grid CFD algorithms. The solver library supports both scalar and block-tridiagonal matrices suitable for approxima...

متن کامل

Efficient heterogeneous execution on large multicore and accelerator platforms: Case study using a block tridiagonal solver

The algorithmic and implementation principles are explored in gainfully exploiting GPU accelerators in conjunction with multicore processors on high-end systems with large numbers of compute nodes, and evaluated in an implementation of a scalable block tridiagonal solver. The accelerator of each compute node is exploited in combination with multicore processors of that node in performing block-...

متن کامل

The Comparison of Parallel Sorting Algorithms Implemented on Different Hardware Platforms

Sorting is a common problem in computer science. There are a lot of wellknown sorting algorithms created for sequential execution on a single processor. Recently, many-core and multi-core platforms have enabled the creation of wide parallel algorithms. We have standard processors that consist of multiple cores and hardware accelerators, like the GPU. Graphic cards, with their parallel architect...

متن کامل

A Memory-Efficient Algorithm for Large-Scale Symmetric Tridiagonal Eigenvalue Problem on Multi-GPU Systems

Divide-and-conquer algorithm is a numerically stable and efficient algorithm that computes the eigenvalues and eigenvectors of a symmetric tridiagonal matrix. We often face the situation where the input matrix fits into the main memory but not into the on-chip memory of a GPU device. We present an out-of-core implementation where only part of the input matrix is resident in GPU memory at any po...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2011

Selecting the Best Tridiagonal System Solver Projected on Multi-Core CPU and GPU Platforms

نویسندگان

چکیده

منابع مشابه

Performance Evaluation and Analysis for Conjugate Gradient Solver on Heterogeneous (Multi-GPUs/Multi-CPUs) platforms

GPGPU parallel algorithms for structured-grid CFD codes

Efficient heterogeneous execution on large multicore and accelerator platforms: Case study using a block tridiagonal solver

The Comparison of Parallel Sorting Algorithms Implemented on Different Hardware Platforms

A Memory-Efficient Algorithm for Large-Scale Symmetric Tridiagonal Eigenvalue Problem on Multi-GPU Systems

عنوان ژورنال:

اشتراک گذاری